Implement non-blocking model loading with health state transition #36
base: main
Conversation
Force-pushed from 64f3073 to 6962fc0
Force-pushed from d00deef to 15ea366: …tach frame processor to server health state
Force-pushed from 15ea366 to 2b17af1: …c registration. Add route_prefix
examples/loading_overlay.py (Outdated)
    return frame

# Create loading overlay frame using utility (RGB format to match tensor expectations)
loading_frame = create_loading_frame(width, height, loading_message, frame_counter, color_format="RGB")
Should we do this internally, so that if the pipeline is not ready yet it displays the default overlay? If we want to let the user customize the loading display, we could add something like modify_loading_frames that the user can import and override.
Agree, will move this into frame_processor
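The shape being discussed could look something like the sketch below: the frame processor renders a built-in overlay while the pipeline is not ready, and exposes a module-level hook the user can import and reassign. All names here (modify_loading_frame, FrameProcessor, the renderer) are illustrative assumptions, not pytrickle's actual API.

```python
# Hypothetical sketch: default loading overlay inside the frame processor,
# with a user-overridable hook. Names are illustrative, not the real API.
from typing import Callable, Optional

def _default_loading_frame(width: int, height: int, message: str, tick: int):
    # Stand-in for the built-in overlay renderer (e.g. create_loading_frame).
    return {"size": (width, height), "message": message, "tick": tick}

# Users could import this module and reassign the hook to customize the overlay.
modify_loading_frame: Optional[Callable] = None

class FrameProcessor:
    def __init__(self):
        self.ready = False   # flipped to True once model load completes
        self._tick = 0       # drives the overlay animation

    def process(self, frame, width=640, height=480):
        if not self.ready:
            self._tick += 1
            render = modify_loading_frame or _default_loading_frame
            return render(width, height, "Loading model...", self._tick)
        return frame
```

The appeal of this design is that example apps no longer need overlay logic at all; only users who want a custom look touch the hook.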
Tested the changes with the decorated
…ing overlay logic in video processing
- move ErrorCallback to frame_processor to fix editable install
…rocessing
- Implemented clear_input_queues method in TrickleClient to clear pending frames from video and audio input queues.
- Updated StreamServer to call clear_input_queues before updating parameters, ensuring stale frames are not processed.
- Enhanced logging for model loading and warmup sequences in FrameProcessor
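A minimal sketch of what a clear_input_queues method like the one described could look like, draining pending frames from asyncio queues without blocking. The queue attribute names are assumptions for illustration, not the actual TrickleClient internals.

```python
# Hedged sketch of clear_input_queues on a TrickleClient-like class.
# Attribute names (video_input_queue, audio_input_queue) are assumptions.
import asyncio

class TrickleClient:
    def __init__(self):
        self.video_input_queue: asyncio.Queue = asyncio.Queue()
        self.audio_input_queue: asyncio.Queue = asyncio.Queue()

    async def clear_input_queues(self) -> int:
        """Drop all pending frames; return how many were discarded."""
        dropped = 0
        for q in (self.video_input_queue, self.audio_input_queue):
            while not q.empty():
                try:
                    q.get_nowait()
                    dropped += 1
                except asyncio.QueueEmpty:
                    break
        return dropped
```

Returning the drop count is optional but makes the behavior easy to log when parameters change.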
…lated components
- Added a warmup decorator for handler functions in decorators.py.
- Enhanced FrameProcessor to automatically trigger warmup after model loading.
- Introduced WarmupProtocol for defining warmup handler signatures in registry.py.
- Updated StreamProcessor to support warmup handlers and manage warmup sequences.
- Improved logging for warmup processes to ensure better traceability.
# Handle warmup sentinel message (extract it with pop)
warmup_params = params.pop("warmup", None)
if warmup_params is not None:
Should this trigger on all model loads as well?
to the user's param_updater callback.
"""
# Handle model loading sentinel message (extract it with pop)
if params.pop("load_model", False):
What is the workflow here? Is this an indication that the client thinks the update will require a significant change to the pipeline that will take some time?

If so, what do you think of using reload_pipeline? This could cover loading a LoRA, changing pipeline configs only available at load time, etc.
I like the idea of reload_pipeline. We can implement it as a managed state control (OK/IDLE -> LOADING -> OK/IDLE) that runs load_model and warmup via a prompt update. pytrickle is largely "model unaware", so I don't think we should do anything like force-freeing GPU memory. IMHO it should be up to the consuming app to track and manage the state of its model, and to handle callbacks to load_model appropriately when the model is already loaded into memory. wdyt? @ad-astra-video

ComfyStream is unique here because it automatically loads/unloads VRAM based on workflow changes.
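The managed state control suggested above (OK/IDLE -> LOADING -> OK/IDLE) could be sketched roughly as follows. The enum values and method names are illustrative assumptions; the consuming app supplies load_model and warmup callables, and pytrickle stays model-unaware.

```python
# Sketch of reload_pipeline as a managed state transition.
# Enum values and method names are illustrative assumptions.
from enum import Enum, auto

class PipelineState(Enum):
    IDLE = auto()
    LOADING = auto()
    OK = auto()

class PipelineStateMachine:
    def __init__(self):
        self.state = PipelineState.IDLE
        self.history = [self.state]  # kept for observability in this sketch

    def _set(self, state):
        self.state = state
        self.history.append(state)

    def reload_pipeline(self, load_model, warmup=None):
        """Run the app-supplied load_model and warmup under LOADING state."""
        if self.state == PipelineState.LOADING:
            raise RuntimeError("reload already in progress")
        self._set(PipelineState.LOADING)
        try:
            load_model()      # consuming app decides what loading means
            if warmup:
                warmup()
        finally:
            self._set(PipelineState.IDLE)  # return to serving-ready state
```

While the machine is in LOADING, the frame processor could serve the loading overlay instead of real output, which ties this directly back to the health-state work in this PR.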
pytrickle/stream_handler.py (Outdated)
from typing import Optional, Dict, Any

-from . import ErrorCallback
+from .base import ErrorCallback
Can you document in the PR why the ErrorCallback had to move? Explaining how you found the issue and how you tested the fix would help too.
I did some testing and found this change unnecessary, so it's been reverted in 5a1cb95
pytrickle/state.py (Outdated)
    logger.info("State transition: LOADING → IDLE")
    self.set_state(PipelineState.IDLE)
else:
    logger.info(f"State already {self._state.name}, not transitioning")
This seems like it should be debug level, but I'll defer to your judgement if it provides value during development, given your experience working this into ComfyStream.
I agree, this is an excess debug log
pytrickle/server.py (Outdated)
}, status=400)

# Clear input queues before updating to avoid processing stale frames
await self.current_client.clear_input_queues()
As we discussed, I think this should be a setting in the update params so it can be turned off. I don't think every update to params should drop all frames in the input queue.
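One way the suggested setting could work: a flag inside the params payload that defaults to clearing stale frames but lets callers opt out. The key name "clear_input_queues" and the stub client below are hypothetical, for illustration only.

```python
# Sketch of an opt-out flag for queue clearing during a params update.
# The "clear_input_queues" key and StubClient are hypothetical.
import asyncio

class StubClient:
    """Minimal stand-in for TrickleClient, just for this sketch."""
    def __init__(self):
        self.cleared = False
        self.params = {}

    async def clear_input_queues(self):
        self.cleared = True

async def update_params(client, params: dict):
    # Pop the control flag so it is not forwarded to the param_updater.
    if params.pop("clear_input_queues", True):
        await client.clear_input_queues()
    client.params = params
```

Popping the flag keeps the sentinel out of the user-visible params, matching how the "warmup" and "load_model" sentinels are extracted elsewhere in this PR.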
…ter processing
- Consolidated warmup logic into a new trigger_warmup method for better clarity and reuse.
- Updated _InternalFrameProcessor to handle control sentinels for model loading and warmup more efficiently.
This pull request introduces a comprehensive example for model loading and loading overlays, adds new configuration and handler support for warmup routines, and improves the codebase with better queue management and testing configuration. The most important changes are grouped below:
New Example and Documentation Updates
- Added an examples/loading_overlay.py example demonstrating non-blocking model loading, health state transitions, animated loading overlays, and real-time parameter updates. This helps users understand how to implement loading overlays and manage model load states.
- Updated README.md to reference the new example files for green tint and model loading overlays, improving documentation accuracy.

Warmup Handler and Configuration Support
- Added WarmupConfig and WarmupMode to the public API, and introduced a warmup decorator in pytrickle/decorators.py for registering warmup handlers. This enables explicit warmup routines for models and resources. [1] [2]
- Updated FrameProcessor to support warmup configuration, warmup state management, and a thread-safe model loading process that includes warmup execution. This ensures models and overlays are properly initialized before serving requests.

Queue Management and Testing Improvements
- Added a clear_input_queues method to TrickleClient to clear stale video and audio frames when parameters change, preventing outdated frames from being processed with new settings.
- Added .vscode/settings.json for easier test discovery and execution.

API and Import Cleanups
- Moved ErrorCallback to pytrickle/base.py, updated related imports across files, and exposed new API functions such as build_loading_overlay_frame. [1] [2] [3] [4] [5]

Development Environment